Preparation: Please open your RStudio project and download the new data (‘Phylogeny_ConservativeCon_Checklist.zip’, ‘palms_specsxsites_phylo.csv’ and ‘palmtree_pruned.nex’) for today from Stud-IP and unzip and copy them into your .Rproj folder ‘/Data/’. You can use getwd() to locate your current working directory, which should be your project folder. Please install the following R-packages using install.packages():
Load R packages & island data set
library(plyr) # basic data manipulation
library(rgdal) # Geospatial fun
library(rgeos) # Geospatial fun
library(maptools)# Geospatial fun
library(Taxonstand) # Taxonomic standardisation of plant species names
library(ape) # Analyses of phylogenetics and evolution
library(picante) # Phylogenies and ecology
library(pez) # Phylogenies and ecology
library(psych) # basic stats stuff
library(classInt) # for plotting nice colors
library(colorRamps) # for plotting nice colors
load Multi tree object into R
Data from Faurby, S., Eiserhardt, W.L., Baker, W.J. & Svenning, J.-C. (2016).
An all-evidence species-level supertree for the palms (Arecaceae).
Molecular Phylogenetics and Evolution, 100, 57-69.
palmtrees <- read.nexus("Data/Phylogeny_ConservativeCon_Checklist.nex")
palmtrees
## 1000 phylogenetic trees
palmtrees[[1]]
##
## Phylogenetic tree with 2539 tips and 2538 internal nodes.
##
## Tip labels:
## Nypa_fruticans, Sabal_mauritiiformis, Sabal_pumos, Sabal_domingensis, Sabal_minor, Sabal_palmetto, ...
##
## Rooted; includes branch lengths.
palmtree <- palmtrees[[1]]
str(palmtree)
## List of 4
## $ edge : int [1:5076, 1:2] 2540 2541 2541 2542 2543 2544 2545 2546 2546 2547 ...
## $ edge.length: num [1:5076] 8.89 95.72 9.44 9.78 12.25 ...
## $ Nnode : int 2538
## $ tip.label : chr [1:2539] "Nypa_fruticans" "Sabal_mauritiiformis" "Sabal_pumos" "Sabal_domingensis" ...
## - attr(*, "class")= chr "phylo"
## - attr(*, "order")= chr "cladewise"
head(palmtree$tip.label)
## [1] "Nypa_fruticans" "Sabal_mauritiiformis" "Sabal_pumos"
## [4] "Sabal_domingensis" "Sabal_minor" "Sabal_palmetto"
length(palmtree$tip.label)
## [1] 2539
is.rooted(palmtree) # does the phylogeny have a root? i.e. can the most basal ancestor be identified?
## [1] TRUE
Explanation of structure of a phylogeny in R
edge: a two-column matrix where each row represents a branch (or edge) of the tree. The nodes and the tips are symbolized with integers.The n tips are numbered from 1 to n, and the m (internal) nodes from n+1 to n+m (the root being n + 1).For each row, the first column gives the ancestor.
edge.length (optional): A numeric vector giving the lengths of the branches given by edge.
tip.label: A vector of mode character giving the labels of the tips. The order of these labels corresponds to the integers 1 to n in edge.
Nnode: An integer value giving the number of nodes in the tree (m).
node.label (optional): A vector of mode character giving the labels of the nodes (ordered in the same way as the tip.label).
root.edge (optional): A numeric value giving the length of the branch at the root if it exists.
plot(palmtree,type="fan",cex=0.3, edge.color="gray70",tip.color="#ef8a62")
Data from Kreft, H., Sommer, J.H. & Barthlott, W. (2006).
The significance of geographic range size for spatial diversity
patterns in Neotropical palms. Ecography, 29, 21-30.
species <- read.csv("Data/palms_species_per_gridcell.csv",sep=",", stringsAsFactors = FALSE)
head(species)
## grid_id spp_Id GENUS EPITHET
## 1 13735 550 Ammandra natalia
## 2 13736 550 Ammandra natalia
## 3 13737 550 Ammandra natalia
## 4 13738 550 Ammandra natalia
## 5 13910 550 Ammandra natalia
## 6 13911 550 Ammandra natalia
length(unique(species$grid_id)) # number of grid cells
## [1] 6638
Tasks:
1. Add a column for the full species name with "_" as separator: “species”.
2. How many palm species occur in the Americas?
3. How many and which species in the palm distribution dataset are missing from the phylogeny?
Exercise I solutions:
Add a column for the full species name with a hyphen as separator: “species”
## [1] "Ammandra_natalia" "Ammandra_decasperma"
## [3] "Ammandra_dasyneura" "Phytelephas_tumacana"
## [5] "Phytelephas_tenuicaulis"
How many palm species occur in the Americas?
## [1] 550
How many and which species in the palm distribution dataset are missing from the phylogeny?
## [1] 80
## [1] "Ammandra_natalia" "Ammandra_dasyneura" "Geonoma_weberbaueri"
## [4] "Geonoma_rubescens" "Geonoma_polyandra"
Write species missing from phylogeny into a vector
specmissing <- unique(species$species[which(!species$species %in% palmtree$tip.label)])
specmissing <- gsub("_"," ",specmissing) # # replace underscore with a space
Match palm species names to those in The Plant List
taxstand <- TPL(specmissing, diffchar = 2, max.distance = 1)
head(taxstand)[,11:16]
## TPL.version Taxonomic.status Family New.Genus New.Hybrid.marker
## 1 1.1 Synonym Arecaceae Aphandra
## 2 1.1 Synonym Arecaceae Ammandra
## 3 1.1 Synonym Arecaceae Geonoma
## 4 1.1 Synonym Arecaceae Geonoma
## 5 1.1 Synonym Arecaceae Geonoma
## 6 1.1 Accepted Arecaceae Geonoma
## New.Species
## 1 natalia
## 2 decasperma
## 3 undata
## 4 pohliana
## 5 multisecta
## 6 poeppigiana
Replace old names by new names & overwrite the combined species name
for (i in 1:nrow(taxstand)){
species$GENUS[which(species$GENUS == taxstand$Genus[i] & species$EPITHET == taxstand$Species[i])] <- taxstand$New.Genus[i]
species$EPITHET[which(species$GENUS == taxstand$Genus[i] & species$EPITHET == taxstand$Species[i])] <- taxstand$New.Species[i]
}
species$species <- paste(species$GENUS,species$EPITHET, sep="_") # overwrite the combined species name
Check names that are not in the phylogeny
specmissing <- unique(species$species[which(!species$species %in% palmtree$tip.label)])
specmissing
## [1] "Geonoma_paraguensis" "Calyptrogyne_kunaria"
## [3] "Desmoncus_phoenicocarpus" "Desmoncus_cirrhiferus"
## [5] "Bactris_horrispatha" "Bactris_hondurensis"
## [7] "Gastrococcus_crispa" "Polyandrococcus_caudescens"
## [9] "Hypospathe_altissima" "Hypospathe_macrorachis"
## [11] "Hypospathe_elegans" "Prestoea_longipetiolata"
## [13] "Dictyocarpum_ptarianum" "Dictyocarpum_lamarckianum"
## [15] "Dictyocarpum_fuscum" "Chamaedorea_selvae"
## [17] "Chamaedorea_ernesti-angustii" "Acoelorraphe_wrightii"
## [19] "Coplothrinax_wrightii" "Coplothrinax_cookii"
## [21] "Crysophila_williamsii" "Crysophila_warscewiczii"
## [23] "Crysophila_stauracantha" "Crysophila_nana"
## [25] "Crysophila_macrocarpa" "Crysophila_kalbreyeri"
## [27] "Crysophila_guagara" "Crysophila_grayumii"
## [29] "Crysophila_cookii"
length(specmissing)
## [1] 29
species <- species[which(!species$species %in% specmissing),]
Missing species would have to be looked up and added manually to the phylogeny by an expert.
If missing species cannot be added reliably, one can remove them from the dataset for analysis.
Read in polygon shapefile
americas <- readOGR("Data/americas.shp")
## OGR data source with driver: ESRI Shapefile
## Source: "/home/dylan/Dropbox (iDiv)/Teaching/MCMMB/Data/americas.shp", layer: "americas"
## with 1 features
## It has 15 fields
## Integer64 fields read as strings: POP_CNTRY
grid <- readOGR("Data/30min_grid_select50%.shp", integer64="allow.loss")
## OGR data source with driver: ESRI Shapefile
## Source: "/home/dylan/Dropbox (iDiv)/Teaching/MCMMB/Data/30min_grid_select50%.shp", layer: "30min_grid_select50%"
## with 6038 features
## It has 3 fields
## Integer64 fields read as signed 32-bit integers: ID
names(grid@data)[1] <- "grid_id"
Convert the grid shapefile to a coarser resolution to have fewer grid cells
grid_centroids <- gCentroid(grid, byid=TRUE) # make spatial points dataframe from grid
grid_centroids <- SpatialPointsDataFrame(grid_centroids, data = data.frame(grid_centroids@coords))
names(grid_centroids@data) <- c("longitude","latitude")
par(mfrow=c(1,1))
plot(grid_centroids)
plot(americas, add=TRUE)
Add coordinates to SpatialPolygonsDataframe
grid@data <- cbind(grid@data,grid_centroids@data)
head(grid@data)
## grid_id AREA PORTION longitude latitude
## 0 538 0.25 100 -114.2507 36.25
## 1 539 0.25 100 -113.7507 36.25
## 2 540 0.25 100 -113.2507 36.25
## 3 712 0.25 100 -114.7507 35.75
## 4 713 0.25 100 -114.2507 35.75
## 5 714 0.25 100 -113.7507 35.75
Make new IDs for neighbouring cells
range(grid@data$longitude)
## [1] -115.75066 -35.25066
range(grid@data$latitude)
## [1] -34.75 36.25
longitude <- seq(-116,-34, by=2)
latitude <- seq(-35,35, by=2)
new_ID_matrix <- matrix(c(1:(length(longitude)*length(latitude))), nrow=length(longitude), ncol=length(latitude) , dimnames = list(longitude, latitude), byrow=TRUE)
grid@data$new_ID <- NA
for(i in 1:nrow(grid@data)){
grid@data$new_ID[i] <- new_ID_matrix[which(longitude < grid@data$longitude[i] & (longitude + 2) > grid@data$longitude[i]), which(latitude < grid@data$latitude[i] & (latitude + 2) > grid@data$latitude[i])]
}
head(grid@data)
## grid_id AREA PORTION longitude latitude new_ID
## 0 538 0.25 100 -114.2507 36.25 36
## 1 539 0.25 100 -113.7507 36.25 72
## 2 540 0.25 100 -113.2507 36.25 72
## 3 712 0.25 100 -114.7507 35.75 36
## 4 713 0.25 100 -114.2507 35.75 36
## 5 714 0.25 100 -113.7507 35.75 72
Join neighbouring cells based on new IDs & aggregate species distribution data using the new grid
grid_2degrees <- unionSpatialPolygons(grid, grid@data$new_ID)
uniqueIDs <- unique(grid@data$new_ID)
grid_2degrees <- SpatialPolygonsDataFrame(grid_2degrees, data=data.frame(new_ID=uniqueIDs, row.names = uniqueIDs))
head(grid_2degrees@data)
## new_ID
## 1009 1009
## 1010 1010
## 1011 1011
## 1012 1012
## 1013 1013
## 1014 1014
plot(grid_2degrees)
plot(americas, add=TRUE)
species <- join(species, grid@data, by="grid_id", type="inner")
head(species)
## grid_id spp_Id GENUS EPITHET species AREA PORTION longitude
## 1 13735 550 Aphandra natalia Aphandra_natalia 0.25 100 -78.25066
## 2 13736 550 Aphandra natalia Aphandra_natalia 0.25 100 -77.75066
## 3 13737 550 Aphandra natalia Aphandra_natalia 0.25 100 -77.25066
## 4 13738 550 Aphandra natalia Aphandra_natalia 0.25 100 -76.75066
## 5 13910 550 Aphandra natalia Aphandra_natalia 0.25 100 -78.25066
## 6 13911 550 Aphandra natalia Aphandra_natalia 0.25 100 -77.75066
## latitude new_ID
## 1 -1.25 665
## 2 -1.25 701
## 3 -1.25 701
## 4 -1.25 701
## 5 -1.75 665
## 6 -1.75 701
Prune phylogeny to exclude species not in palm distribution data
Make sure that the same number of species in your data set are in your phylogeny
palmtree_pruned <- drop.tip(palmtree,palmtree$tip.label[which(!palmtree$tip.label %in% unique(species$species))])
length(palmtree_pruned$tip.label) #
## [1] 498
length(unique(species$species))
## [1] 498
write.nexus(palmtree_pruned, file="Data/palmtree_pruned.nex")
Convert long table to species by grid-cell table (a community matrix)
species <- unique(species[,c("new_ID","species")])
species_grid <- as.data.frame.matrix(table(species))
species_grid<-as.matrix(species_grid)
write.csv(species_grid, file = "Data/palms_specsxsites_phylo.csv")
Calculate phylogenetic diversity
palmtree_pruned<-read.nexus("Data/palmtree_pruned.nex") # if you have a hard time reading in the large palm phylogeny
species_grid <- as.matrix(read.csv("Data/palms_specsxsites_phylo.csv", row.names = 1)) # in case you got stuck above
pd_palms <- pd(species_grid, palmtree_pruned)
head(pd_palms)
## PD SR
## 32 140.0006 2
## 33 140.0006 2
## 34 104.6140 1
## 35 104.6140 1
## 36 104.6140 1
## 67 140.0006 2
pd() calculates Faith’s PD and a corrected version of it being the residuals of a regression with total abundance or species richness per plot. Lets look into this:
pd_model <- lm(PD ~ SR, data = pd_palms)
pd_palms$residuals <- residuals(pd_model)
par(mfrow=c(1,2))
plot(pd_palms$SR, pd_palms$PD, main="Species richness vs Faith's PD", xlab="Species number", ylab="Faith's PD", cex.main=0.8,cex.axis=0.8)
abline(pd_model)
plot(pd_palms$SR, pd_palms$residuals, main="Species richness vs Faith's PD residuals", xlab="Species number", ylab="Faith's PD residuals", cex.main=0.8,cex.axis=0.8)
Calculate generic phylogenetic metrics (absolute and standardized)
Pphylogenetic diversity metrics are standardized by first using a null model to generate expected values.
Then, the observed value is compared to the null, usually using standardized effect sizes:
\(SES = (observed - mean.null)/sd.null\), where mean.null is the mean of the null values and sd.null is the standard deviation of the null values.
c.data <- comparative.comm(palmtree_pruned , species_grid)
pd.null<-generic.null(c.data, c(.pd,.mpd,.mntd),null.model = "richness",comp.fun=.ses,permute=100)
colnames(pd.null) <- c("Faith_PD","Corrected_FaithPD","MPD", "MNTD")
rownames(pd.null) <- sites(c.data)
pd.null<-data.frame(pd.null)
pd.null$new_ID<-as.numeric(rownames(pd.null))
pd.out <- pd.null[,c(21, 1,13,3,15,4,16)] # select columns for final data set
pd.out [is.na(pd.out )] <- 0 # change NAs to zeroes; no phylogenetic diversity if grid has only 1 spp!
par(mfrow=c(1,2))
plot(pd_palms$SR, pd_palms$residuals, main="Species richness vs Faith's PD residuals", xlab="Species number", ylab="Faith's PD residuals", cex.main=0.8,cex.axis=0.8)
plot(pd.out$Faith_PD.observed, pd.out$Faith_PD.SES,main="Faith's PD vs standardised Faith's PD",cex.main=0.8,xlab="observed Faith's PD",ylab="standardised Faith's PD",cex.axis=0.8)
Tasks:
1. Create a data.frame called ‘palmdiversity’ with species richness, PD and residual PD from ‘pd_palms’ and standardized pd metrics from ‘pd.out’.
2. Plot correlations among the diversity metrics.
3. Join the ‘palmdiversity’ data.frame to the grid_2degrees shapefile to be able to link the diversity metrics and the poylgons for plotting. 4. Plot maps with colours according to the diversity metrics.
Exercise II solutions:
Create a data.frame called ‘palmdiversity’ by joining ‘pd_palms’ with ‘pd.out’
## new_ID PD SR residuals Faith_PD.observed Faith_PD.SES
## 1 32 140.0006 2 -96.80594 140.0006 -15.91277
## 2 33 140.0006 2 -96.80594 140.0006 -13.84592
## 3 34 104.6140 1 -112.12659 104.6140 -15.55744
## 4 35 104.6140 1 -112.12659 104.6140 -15.48190
## 5 36 104.6140 1 -112.12659 104.6140 -12.01928
## 6 67 140.0006 2 -96.80594 140.0006 -16.27082
## MPD.observed MPD.SES MNTD.observed MNTD.SES
## 1 70.7732 -15.91277 70.7732 -15.91277
## 2 70.7732 -13.84592 70.7732 -13.84592
## 3 0.0000 0.00000 0.0000 0.00000
## 4 0.0000 0.00000 0.0000 0.00000
## 5 0.0000 0.00000 0.0000 0.00000
## 6 70.7732 -16.27082 70.7732 -16.27082
Plot correlations among the diversity metrics
Join the palmdiversity data.frame to the shapefile with the new gridcell IDs
## new_ID PD SR residuals Faith_PD.observed Faith_PD.SES
## 1 1009 104.6140 1 -112.126586 104.6140 -16.320707
## 2 1010 190.8985 2 -45.908126 190.8985 8.548754
## 3 1011 212.9137 3 -43.958851 212.9137 -6.371583
## 4 1012 160.0228 2 -76.783808 160.0228 -7.815616
## 5 1013 252.0204 3 -4.852128 252.0204 6.280482
## 6 1014 329.4444 5 32.439925 329.4444 -2.904875
## MPD.observed MPD.SES MNTD.observed MNTD.SES
## 1 0.0000 0.000000 0.00000 0.000000
## 2 172.5688 8.548754 172.56882 8.548754
## 3 129.7227 -2.677800 86.87658 -9.078391
## 4 110.8175 -7.815616 110.81746 -7.815616
## 5 155.7938 6.673172 139.01887 5.794896
## 6 142.3322 2.984331 88.67323 -5.864386
Plot maps with different colours for each diversity metric
Species richness, PD, MPD & MNTD (absolute)
PD, MPD & MNTD (standardized)